Abstract

The Pearson distance has been advocated for improving the
error performance of noisy channels with unknown gain and offset.
The Pearson distance can only fruitfully be used for sets of
$q$-ary codewords, called Pearson codes, that satisfy specific
properties. We will analyze constructions and properties of optimal
Pearson codes. We will compare the redundancy of optimal Pearson
codes with the redundancy of prior art $T$-constrained codes,
which consist of $q$-ary sequences in which $T$ pre-determined
reference symbols appear at least once. In particular, it will be shown
that for $q \le 3$ the 2-constrained codes are optimal Pearson codes,
while for $q \ge 4$ these codes are not optimal.
1 Introduction


In non-volatile memories, such as floating gate memories, the data is
represented by stored charge, which can leak away from the floating gate.
This leakage may result in a shift of the offset or threshold voltage of
the memory cell. The amount of leakage depends on the time elapsed
between writing and reading the data. As a result, the offset between
different groups of cells may be very different, so that prior art automatic
offset or gain control, which estimates the mismatch from the previously
received data, cannot be applied. Methods to solve these difficulties in
Flash memories have been discussed in the literature. In
optical disc media, such as the popular Compact Disc, DVD, and Blu-ray
disc, the retrieved signal depends on the dimensions of the written
features and upon the quality of the light path, which may be obscured by
fingerprints or scratches on the substrate. Fingerprints and scratches will
result in rapidly varying offset and gain variations of the retrieved signal.
Automatic gain and offset control in combination with dc-balanced codes
are applied, albeit at the cost of redundancy, and thus improvements
to the art are welcome.

Immink & Weber showed that detectors that use the Pearson distance
offer immunity to offset and gain mismatch. The Pearson distance
can only be used for a set of codewords with special properties, called a
Pearson set or Pearson code. Let $S$ be a codebook of chosen $q$-ary
codewords $x = (x_1, x_2, \ldots, x_n)$ over the $q$-ary alphabet
$Q = \{0, 1, \ldots, q-1\}$, $q \ge 2$, where $n$, the length of $x$, is a
positive integer. Note that the alphabet symbols are to be treated as being
just integers rather than elements of $\mathbb{Z}_q$. A Pearson code with
maximum possible size given the parameters $q$ and $n$ is said to be optimal.

In Section 2 we set the stage with a description of Pearson distance
detection and the properties of the constrained codes used in conjunction
with it. Section 3 gives a description of $T$-constrained codes, a type of
code described in the prior art and used in conjunction with the Pearson
distance detector, while Section 4 offers a general construction of optimal
Pearson codes and a computation of their cardinalities. The rates of
$T$-constrained codes will be compared with optimal rates of Pearson codes.
In Section 5 we will describe our conclusions.
2 Preliminaries


We use the shorthand notation $av + b = (av_1 + b, av_2 + b, \ldots, av_n + b)$.
In the prior work on Pearson distance detection, the authors suppose a
situation where the sent codeword, $x$, is received as the vector
$r = a(x + \nu) + b$, $r_i \in \mathbb{R}$. Here $a$ and $b$ are unknown real
numbers with $a$ positive, called the gain and the (dc-) offset respectively.
Moreover, $\nu$ is an additive noise vector: $\nu = (\nu_1, \ldots, \nu_n)$,
where $\nu_i \in \mathbb{R}$ are noise samples from a zero-mean Gaussian
distribution. Note that both gain and offset do not vary from symbol to
symbol, but are the same for all $n$ symbols.

The receiver’s ignorance of the channel’s momentary gain and offset
may lead to massive performance degradation, as shown in the prior art,
when a traditional detector, such as a threshold or maximum
likelihood detector, is used. In the prior art, various methods have been
proposed to overcome this difficulty. In a first method, data reference, or
‘training’, patterns are multiplexed with the user data in order to ‘teach’
the data detection circuitry the momentary values of the channel’s
characteristics such as impulse response, gain, and offset. In a channel with
unknown gain and offset, we may use two reference symbol values, where
in each codeword, a first symbol is set equal to the lowest signal level
and a second symbol equal to the highest signal level. The positions and
amplitudes of the two reference symbols are known to the receiver. The
receiver can straightforwardly measure the amplitude of the retrieved
reference symbols, and normalize the amplitudes of the remaining
symbols of the retrieved codeword before applying detection. Clearly, the
redundancy of the method is two symbols per codeword.
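The reference-symbol method can be sketched as follows. This is a minimal illustration, not the circuitry of any cited system: the function name `detect_with_references` and the convention that the lowest-level reference sits at position 0 and the highest-level reference at position 1 are assumptions made only for this example.

```python
def detect_with_references(r, q, lo_pos=0, hi_pos=1):
    """Estimate gain and offset from two reference symbols, then threshold.

    Assumption (hypothetical layout): the codeword carries symbol 0 at
    lo_pos and symbol q-1 at hi_pos, and the receiver knows both positions.
    """
    offset = r[lo_pos]                          # level 0 is received as a*0 + b
    gain = (r[hi_pos] - r[lo_pos]) / (q - 1)    # level q-1 is received as a*(q-1) + b
    normalized = [(ri - offset) / gain for ri in r]
    # Round to the nearest alphabet symbol and clip to {0, ..., q-1}.
    return [min(q - 1, max(0, round(v))) for v in normalized]


x = [0, 3, 2, 1, 3, 0]                 # first two symbols are the references
r = [1.3 * xi + 0.25 for xi in x]      # unknown gain 1.3, unknown offset 0.25
print(detect_with_references(r, q=4))
```

Noise on the two reference symbols directly perturbs the gain and offset estimates, which is one reason the text goes on to consider alternatives.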

In a second prior art method, codes satisfying equal balance and
energy constraints, which are immune to gain and offset mismatch, have
been advocated. The redundancy of these codes, denoted by $r_0$, is given
by

$$ r_0 \approx \log_q n + \log_q \left( (q^2 - 1)\sqrt{q^2 - 4} \right) + \log_q \frac{\pi}{12\sqrt{15}}. \tag{1} $$

In a recent contribution, Pearson distance detection is advocated since
its redundancy is much less than that of balanced codes. The Pearson
distance between the vectors $x$ and $\hat{x}$ is defined as follows. For a
vector $u$, define

$$ \bar{u} = \frac{1}{n} \sum_{i=1}^{n} u_i \tag{2} $$

and

$$ \sigma_u^2 = \sum_{i=1}^{n} (u_i - \bar{u})^2. \tag{3} $$

Note that $\sigma_x$ is closely related to, but not the same as, the standard
deviation of $x$. The (Pearson) correlation coefficient is defined by

$$ \rho_{x,\hat{x}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\hat{x}_i - \bar{\hat{x}})}{\sigma_x \sigma_{\hat{x}}}, \tag{4} $$

and the Pearson distance is given by

$$ \delta(x, \hat{x}) = 1 - \rho_{x,\hat{x}}. \tag{5} $$

The Pearson distance and Pearson correlation coefficient are well-known
concepts in statistics and cluster analysis. Note that we have
$|\rho_{x,\hat{x}}| \le 1$ by a corollary of the Cauchy-Schwarz Inequality
[Section IV.4.6], which implies that $0 \le \delta(x, \hat{x}) \le 2$.

A minimum Pearson distance detector outputs the codeword

$$ x_o = \arg\min_{\hat{x} \in S} \delta(r, \hat{x}). $$

As the Pearson distance is translation and scale invariant, that is,

$$ \delta(x, \hat{x}) = \delta(ax + b, \hat{x}), $$

we conclude that the Pearson distance between the vectors $x$ and $\hat{x}$
is independent of the channel’s gain or offset mismatch, so that, as a
result, the error performance of the minimum Pearson distance detector
is immune to gain and offset mismatch. This virtue implies, however, that
the minimum Pearson distance detector cannot be used in conjunction
with arbitrary codebooks, since

$$ \delta(r, \hat{x}) = \delta(r, \hat{y}) $$

if $\hat{y} = c_1 \hat{x} + c_2$, $c_1, c_2 \in \mathbb{R}$ and $c_1 > 0$. In
other words, since a minimum Pearson detector cannot distinguish between the
words $\hat{x}$ and $\hat{y} = c_1 \hat{x} + c_2$, the codewords must be taken
from a codebook $S \subseteq Q^n$ that guarantees unambiguous detection with
the Pearson distance metric (5).
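The definitions (2) to (5) and the invariance argument can be checked numerically. The sketch below is an illustration only; the function names `pearson_distance` and `detect` are chosen for this example, and the zero-variance test is exact only for vectors whose mean is computed without rounding error.

```python
import math


def pearson_distance(u, v):
    """Return 1 - rho(u, v), with rho the Pearson correlation coefficient."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    # sigma is the root of the (unnormalized) sum of squared deviations, as in (3).
    sigma_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sigma_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sigma_u == 0 or sigma_v == 0:
        raise ValueError("Pearson distance is undefined for constant vectors")
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    return 1.0 - cross / (sigma_u * sigma_v)


def detect(r, codebook):
    """Minimum Pearson distance detection: the codeword closest to r."""
    return min(codebook, key=lambda x: pearson_distance(r, x))


x = (0, 1, 3, 2)
r = tuple(1.7 * xi - 0.4 for xi in x)    # gain 1.7, offset -0.4, no noise
y = tuple(2 * xi + 1 for xi in x)        # y = c1*x + c2 with c1 = 2, c2 = 1

print(pearson_distance(r, x))            # essentially zero despite the mismatch
print(pearson_distance(r, y))            # the same value: x and y are indistinguishable
```

The second printed value illustrates why a codebook may not contain both a word and a positively scaled, shifted copy of it.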

It is a well-known property of the Pearson correlation coefficient,
$\rho_{x,\hat{x}}$, that

$$ \rho_{x,\hat{x}} = 1 $$

if and only if

$$ \hat{x} = c_1 + c_2 x, $$

where the coefficients $c_1$ and $c_2 > 0$ are real numbers [Section IV.4.6].
It is further immediate, see (4), that the Pearson distance is undefined
for codewords $x$ with $\sigma_x = 0$, i.e., for multiples of the all-one vector.
We coined the name Pearson code for a set of codewords that can be
uniquely decoded by a minimum Pearson distance detector. We conclude
that codewords in a Pearson code must satisfy two conditions, namely

• Property A: If $x \in S$ then $c_1 + c_2 x \notin S$ for all $c_1, c_2 \in \mathbb{R}$ with
$(c_1, c_2) \ne (0, 1)$ and $c_2 > 0$.

• Property B: $x = (c, c, \ldots, c) \notin S$ for all $c \in \mathbb{R}$.

In the remaining part of this paper, we will study constructions and
properties of Pearson codes. In particular, we are interested in Pearson codes
that are optimal in the sense of having the largest number of codewords
for given parameters $n$ and $q$. We will commence with a description of
prior art $T$-constrained codes, a first example of Pearson codes.
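Properties A and B can be tested directly on a small codebook of integer words. The following is a minimal sketch under the assumption that the codebook lists distinct words; the names `affinely_equivalent` and `is_pearson_code` are hypothetical. It uses the fact that, for non-constant words, $y = c_1 + c_2 x$ with $c_2 > 0$ holds exactly when $y - m(y)$ is a positive multiple of $x - m(x)$, where $m(\cdot)$ denotes the smallest entry.

```python
from itertools import combinations


def affinely_equivalent(x, y):
    """True if y = c1 + c2*x for some real c1 and some c2 > 0.

    Both words are assumed to be non-constant integer tuples.
    """
    xs = [a - min(x) for a in x]
    ys = [b - min(y) for b in y]
    mx, my = max(xs), max(ys)
    # ys = c2 * xs with c2 = my / mx, checked by cross-multiplication.
    return all(b * mx == a * my for a, b in zip(xs, ys))


def is_pearson_code(codebook):
    """Check Property B, then Property A for every pair of distinct words."""
    words = list(codebook)
    if any(min(w) == max(w) for w in words):        # Property B violated
        return False
    return not any(affinely_equivalent(x, y)        # Property A violated
                   for x, y in combinations(words, 2))


print(is_pearson_code([(0, 1, 2), (0, 1, 1)]))   # a valid two-word code
print(is_pearson_code([(0, 1, 2), (0, 2, 4)]))   # second word is twice the first
print(is_pearson_code([(0, 1, 2), (1, 2, 3)]))   # second word is the first plus one
```

The pairwise test costs a number of comparisons quadratic in the codebook size, which is acceptable only for small illustrative codebooks.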


3 T-constrained codes 

For integers $T$ satisfying $1 \le T \le q$, $T$-constrained codes, denoted by
$\mathcal{S}_{q,n}(a_1, \ldots, a_T)$, consist of $q$-ary codewords of length $n$,
where $T$ preferred or reference symbols $a_1, \ldots, a_T \in Q$ must each
appear at least once in a codeword. Thus, each codeword,
$(x_1, x_2, \ldots, x_n)$, in a $T$-constrained code satisfies

$$ |\{i : x_i = j\}| > 0 \text{ for each } j \in \{a_1, \ldots, a_T\}. $$

The number of $q$-ary sequences of length $n$, $N_T(q,n)$, where $T$ distinct
pre-defined symbols occur at least once in every sequence, equals

$$ N_T(q,n) = \sum_{i=0}^{T} (-1)^i \binom{T}{i} (q-i)^n, \quad n \ge T. \tag{6} $$

For example, we easily find for $T = 1$ and $T = 2$ that

$$ N_1(q,n) = q^n - (q-1)^n \tag{7} $$

and

$$ N_2(q,n) = q^n - 2(q-1)^n + (q-2)^n. \tag{8} $$

Clearly, the number of $T$-constrained sequences is not affected by the
choice of the specific $T$ symbols we like to favor.
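The inclusion-exclusion count (6) can be compared against exhaustive enumeration for small parameters. This is an illustrative sketch; the function names `n_constrained` and `brute_force_count` are chosen for this example.

```python
from itertools import product
from math import comb


def n_constrained(q, n, T):
    """Inclusion-exclusion count of q-ary words of length n in which each
    of T fixed reference symbols occurs at least once, as in (6)."""
    return sum((-1) ** i * comb(T, i) * (q - i) ** n for i in range(T + 1))


def brute_force_count(q, n, refs):
    """Count words over {0, ..., q-1} in which every symbol in refs occurs."""
    return sum(1 for w in product(range(q), repeat=n)
               if all(a in w for a in refs))


q, n = 4, 4
print(n_constrained(q, n, 1), brute_force_count(q, n, (0,)))
print(n_constrained(q, n, 2), brute_force_count(q, n, (0, 3)))
print(n_constrained(q, n, 2), brute_force_count(q, n, (1, 2)))  # other reference pair
```

The last two lines illustrate the remark that the count does not depend on which $T$ symbols are favored.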

For the binary case, $q = 2$, we simply find that $\mathcal{S}_{2,n}(0)$ is
obtained by removing the all-‘1’ word from $Q^n$, that $\mathcal{S}_{2,n}(1)$ is
obtained by removing the all-‘0’ word from $Q^n$, and that
$\mathcal{S}_{2,n}(0,1)$ is obtained by removing both the all-‘1’ and all-‘0’
words from $Q^n$, where $Q = \{0, 1\}$. Hence, indeed,

$$ N_1(2,n) = 2^n - 1 $$

and

$$ N_2(2,n) = 2^n - 2. $$

The 2-constrained code $\mathcal{S}_{q,n}(0, q-1)$ is a Pearson code as it
satisfies Properties A and B. There are more examples of 2-constrained
sets that are Pearson codes, such as $\mathcal{S}_{q,n}(0,1)$. Note, however,
that not all 2-constrained sets are Pearson codes. For example,
$\mathcal{S}_{q,n}(0,2)$ does not satisfy Property A if $q \ge 5$, since, e.g.,
both $(0, 1, 2, \ldots, 2)$ and $(0, 2, 4, \ldots, 4) = 2 \times (0, 1, 2, \ldots, 2)$
are codewords.

It is obvious from Property B that the code $\mathcal{S}_{2,n}(0,1)$ of size
$2^n - 2$ is the optimal binary Pearson code. For the ternary case, $q = 3$,
it can easily be argued that $\mathcal{S}_{3,n}(0,1)$, $\mathcal{S}_{3,n}(0,2)$,
and $\mathcal{S}_{3,n}(1,2)$ are all optimal Pearson codes of size
$3^n - 2^{n+1} + 1$.

However, for $q > 3$ the 2-constrained sets such as $\mathcal{S}_{q,n}(0,1)$,
$\mathcal{S}_{q,n}(0, q-1)$, and $\mathcal{S}_{q,n}(q-2, q-1)$, all of size
$N_2(q,n)$, are not optimal Pearson codes, except when $n = 2$. For example,
for $q = 4$, it can be easily checked that the set
$\mathcal{S}_{4,n}(0,3) \cup \mathcal{S}_{3,n}(0,1,2)$ is a Pearson code. Its
size equals $N_2(4,n) + N_3(3,n) = 4^n - 3^n - 2^{n+1} + 3$, which is larger
than $N_2(4,n)$ and actually turns out to be the maximum possible size of any
Pearson code for $q = 4$, as shown in the next section, where we will
address the problem of constructing optimal Pearson codes for any value
of $q$.
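The claims about these specific sets can be checked by enumeration for a small length. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the Pearson test relies on the observation that two non-constant integer words are related as in Property A exactly when they agree after subtracting the minimum and dividing by the greatest common divisor.

```python
from functools import reduce
from itertools import product
from math import gcd


def t_constrained(q, n, refs):
    """All q-ary words of length n in which every symbol of refs occurs."""
    return {w for w in product(range(q), repeat=n)
            if all(a in w for a in refs)}


def normal_form(w):
    """Subtract the minimum, then divide by the GCD (w must not be constant)."""
    m = min(w)
    shifted = tuple(s - m for s in w)
    g = reduce(gcd, shifted)
    return tuple(s // g for s in shifted)


def is_pearson(code):
    """No constant words, and no two words sharing a normal form."""
    if any(min(w) == max(w) for w in code):
        return False
    return len({normal_form(w) for w in code}) == len(code)


n = 4
union = t_constrained(4, n, (0, 3)) | t_constrained(3, n, (0, 1, 2))
print(len(union), 4**n - 3**n - 2**(n + 1) + 3)    # size versus the stated formula
print(is_pearson(union))                           # the union for q = 4
print(is_pearson(t_constrained(5, n, (0, 2))))     # the counterexample for q = 5
```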

4 Optimal Pearson codes 

For $x = (x_1, x_2, \ldots, x_n) \in Q^n$, let $m(x)$ and $M(x)$ denote the
smallest and largest value, respectively, among the $x_i$. Furthermore, in
case $x$ is not the all-zero word, let $\mathrm{GCD}(x)$ denote the greatest
common divisor of the $x_i$. For integers $n, q \ge 2$, let
$\mathcal{P}_{q,n}$ denote the set of all $q$-ary sequences $x$ of length $n$
satisfying the following properties:

1. $m(x) = 0$;

2. $M(x) > 0$;

3. $\mathrm{GCD}(x) = 1$.

Theorem 1. For any $n, q \ge 2$, $\mathcal{P}_{q,n}$ is an optimal Pearson code.

Proof. We will first show that $\mathcal{P}_{q,n}$ is a Pearson code. Property B
is satisfied since any word in $\mathcal{P}_{q,n}$ contains at least one ‘0’ and
at least one symbol unequal to ‘0’. It can be shown that Property A holds by
supposing that $x \in \mathcal{P}_{q,n}$ and $\hat{x} = c_1 + c_2 x \in \mathcal{P}_{q,n}$
for some $c_1, c_2 \in \mathbb{R}$ with $c_2 > 0$. Clearly $c_1 = 0$, since
$c_1 \ne 0$ implies that $m(\hat{x}) \ne 0$. Then, since $\hat{x} = c_2 x$, we
infer that $\mathrm{GCD}(\hat{x}) = c_2 \times \mathrm{GCD}(x) = c_2$. Since, by
definition, $\mathrm{GCD}(\hat{x}) = 1$, we have $c_2 = 1$ and conclude
$\hat{x} = x$, which proves that also Property A is satisfied. We conclude
$\mathcal{P}_{q,n}$ is a Pearson code.

We will now show that $\mathcal{P}_{q,n}$ is the greatest among all Pearson
codes. To that end, let $S$ be any $q$-ary Pearson code of length $n$. We map
all $x \in S$ to $x - m(x)$ and call the resulting code $S'$. Then, we map all
words $x'$ in $S'$ to $x'/\mathrm{GCD}(x')$. Note that the second mapping is
well defined because Property B excludes the all-zero word from $S'$, that
both mappings are injective by Property A, and that all words in the
resulting code $S''$ satisfy Properties 1-3. Hence, $S''$ of size $|S|$ is a
subset of $\mathcal{P}_{q,n}$, which proves that $\mathcal{P}_{q,n}$ is optimal. □
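The construction and the two mappings used in the proof can be illustrated by brute force for small $q$ and $n$. This is a sketch only; `pearson_set` and `normalize` are names chosen for this example. Applying the shift-and-divide map to every non-constant word of $Q^n$ should produce exactly the words satisfying Properties 1-3.

```python
from functools import reduce
from itertools import product
from math import gcd


def pearson_set(q, n):
    """All q-ary words of length n with minimum 0, maximum > 0 and GCD 1."""
    return {w for w in product(range(q), repeat=n)
            if min(w) == 0 and max(w) > 0 and reduce(gcd, w) == 1}


def normalize(w):
    """The two mappings of the proof: x -> x - m(x), then x' -> x'/GCD(x')."""
    m = min(w)
    shifted = tuple(s - m for s in w)
    g = reduce(gcd, shifted)          # nonzero because w is not constant
    return tuple(s // g for s in shifted)


q, n = 4, 3
images = {normalize(w) for w in product(range(q), repeat=n)
          if min(w) != max(w)}        # skip constant words (Property B)
print(images == pearson_set(q, n))
print(len(pearson_set(q, n)), 4**n - 3**n - 2**(n + 1) + 3)
```

The second line compares the enumerated size with the expression quoted in Section 3 for $q = 4$.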


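From the definitions of $T$-constrained sets and $\mathcal{P}_{q,n}$ it follows that

$$ \mathcal{S}_{q,n}(0,1) \subseteq \mathcal{P}_{q,n} \subseteq \mathcal{S}_{q,n}(0). \tag{9} $$

In the following subsections, we will consider the cardinality and
redundancy of $\mathcal{P}_{q,n}$, and compare these to the corresponding
results for $T$-constrained codes.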

4.1 Cardinality 

In this subsection, we study the size $P_{q,n}$ of $\mathcal{P}_{q,n}$. From (9)
and the remark following (8), we have

$$ N_2(q,n) \le P_{q,n} \le N_1(q,n). \tag{10} $$

From Property B we have the trivial upper bound

$$ P_{q,n} \le q^n - q, \tag{11} $$

which is tight in case $q = 2$ as indicated in Section 3, i.e.,

$$ P_{2,n} = 2^n - 2. \tag{12} $$

In order to present expressions for larger values of $q$, we first prove the
following lemma. We define $P_{1,n} = 0$.

Lemma 1. For any $n \ge 2$ and $q \ge 3$,

$$ \sum_{\substack{i=2 \\ (i-1) \mid (q-1)}}^{q} (P_{i,n} - P_{i-1,n}) = q^n - 2(q-1)^n + (q-2)^n, \tag{13} $$

where the summation is over all integers $i$ in the indicated range such
that $i - 1$ is a divisor of $q - 1$.

Proof. For each $i$ such that $2 \le i \le q$ and $i - 1$ is a divisor of
$q - 1$, we define $\mathcal{D}_{i,n}$ as the set of all $i$-ary sequences $y$
of length $n$ satisfying $m(y) = 0$, $M(y) = i - 1$, and $\mathrm{GCD}(y) = 1$.
Let $\mathcal{D}$ denote the union of all these disjoint $\mathcal{D}_{i,n}$.

The mapping $\psi$ from $\mathcal{S}_{q,n}(0, q-1)$ to $\mathcal{D}$, defined by
dividing $x \in \mathcal{S}_{q,n}(0, q-1)$ by $\mathrm{GCD}(x)$, is a bijection.
This follows by observing that, on one hand, $\psi(x)$ is a unique member of
$\mathcal{D}_{(q-1)/\mathrm{GCD}(x)+1,\,n}$, while, on the other hand, any
sequence $y \in \mathcal{D}_{i,n}$ is the image of
$((q-1)/(i-1))\,y \in \mathcal{S}_{q,n}(0, q-1)$ under $\psi$.

Finally, the lemma follows by observing that
$|\mathcal{D}_{i,n}| = P_{i,n} - P_{i-1,n}$ and
$|\mathcal{S}_{q,n}(0, q-1)| = N_2(q,n) = q^n - 2(q-1)^n + (q-2)^n$. □


We thus have with (13) a recursive expression for $P_{q,n}$. Starting from
the result for $q = 2$ in (12), we can find $P_{q,n}$ for any $n$ and $q$.
Expressions for $2 \le q \le 8$ of the size of optimal Pearson codes,
$P_{q,n}$, are tabulated in Table 1. The next theorem offers a closed formula
for the size of optimal Pearson codes, $P_{q,n}$. We start with a definition.

For a positive integer $d$, the Möbius function $\mu(d)$ is defined
[Chapter XVI] to be 0 if $d$ is divisible by the square of a prime, otherwise
$\mu(d) = (-1)^k$ where $k$ is the number of (distinct) prime divisors of $d$.


Theorem 2. Let $n$ and $q$ be positive integers. Let $P_{q,n}$ be the
cardinality of an optimal $q$-ary Pearson code of length $n$. Then

$$ P_{q,n} = \sum_{d=1}^{q-1} \mu(d) \left( \left( \left\lfloor \frac{q-1}{d} \right\rfloor + 1 \right)^n - \left\lfloor \frac{q-1}{d} \right\rfloor^n - 1 \right). \tag{14} $$

We use the following well-known theorem in our proof of Theorem 2.
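Theorem 3. Let $F : \mathbb{R} \to \mathbb{R}$ and $G : \mathbb{R} \to \mathbb{R}$
be functions such that

$$ G(x) = \sum_{d=1}^{\lfloor x \rfloor} F(x/d) $$

for all positive $x$. Then

$$ F(x) = \sum_{d=1}^{\lfloor x \rfloor} \mu(d)\, G(x/d). \tag{15} $$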
Proof (of Theorem 2). For a non-negative real number $x$, define

$$ I_x = \{0, 1, \ldots, \lfloor x \rfloor\} = \mathbb{Z} \cap [0, x]. $$

Let $V_x$ be the set of vectors of length $n$ with entries in $I_x$ and with at
least one zero entry and at least one non-zero entry. Define $G(x) = |V_x|$.
To determine $G(x)$, note that there are $|I_x|^n$ length $n$ vectors with
entries in $I_x$, and we must exclude the all-zero vector and the
$(|I_x| - 1)^n$ vectors with no zero entries. Since $|I_x| = \lfloor x \rfloor + 1$,
we find that

$$ G(x) = |I_x|^n - (|I_x| - 1)^n - 1 = (\lfloor x \rfloor + 1)^n - \lfloor x \rfloor^n - 1. \tag{16} $$

For a positive integer $d$, let $V_{x,d}$ be the set of vectors $c \in V_x$ such
that $\mathrm{GCD}(c) = d$. Since $c \ne 0$, we see that
$1 \le \mathrm{GCD}(c) \le \max_i\{c_i\} \le \lfloor x \rfloor$
and so $V_x$ can be written as the disjoint union

$$ V_x = \bigcup_{d=1}^{\lfloor x \rfloor} V_{x,d}. $$

Moreover, $|V_{x,d}| = |V_{x/d,1}|$, since the map taking $c \in V_{x,d}$ to
$(1/d)c \in V_{x/d,1}$ is a bijection.

Define $F(x) = |V_{x,1}|$, so $F(x)$ is the number of vectors $c \in V_x$ such
that $\mathrm{GCD}(c) = 1$. Now,

$$ G(x) = |V_x| = \sum_{d=1}^{\lfloor x \rfloor} |V_{x,d}| = \sum_{d=1}^{\lfloor x \rfloor} |V_{x/d,1}| = \sum_{d=1}^{\lfloor x \rfloor} F(x/d). $$

So, by Theorem 3, we deduce that (15) holds. Theorem 2 now follows
from the fact that $P_{q,n} = F(q - 1)$, by combining (15) and (16). □
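The closed formula (14) is easy to evaluate and can be cross-checked against the recursion of Lemma 1. The sketch below is illustrative; the function names `mobius`, `pearson_size`, and `lemma1_lhs` are chosen for this example, and the Möbius function is computed by plain trial division.

```python
def mobius(d):
    """Moebius function of a positive integer d, by trial division."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0            # d is divisible by the square of a prime
            result = -result
        p += 1
    if d > 1:
        result = -result            # one remaining prime factor
    return result


def pearson_size(q, n):
    """Closed formula (14) for the size of an optimal q-ary Pearson code."""
    total = 0
    for d in range(1, q):
        k = (q - 1) // d
        total += mobius(d) * ((k + 1) ** n - k ** n - 1)
    return total


def lemma1_lhs(q, n):
    """Left-hand side of (13): sum over i with (i - 1) dividing (q - 1)."""
    return sum(pearson_size(i, n) - pearson_size(i - 1, n)
               for i in range(2, q + 1) if (q - 1) % (i - 1) == 0)


q, n = 8, 10
print(pearson_size(q, n))
print(lemma1_lhs(q, n), q**n - 2 * (q - 1)**n + (q - 2)**n)
```

Note that `pearson_size(1, n)` is an empty sum and returns 0, in line with the convention $P_{1,n} = 0$.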


After perusing Table 1, it appears that for $q \ge 4$, $P_{q,n}$ is roughly
$q^n - (q-1)^n$. An intuitive justification is that among the $q^n$ $q$-ary
sequences of length $n$ there are $(q-1)^n$ sequences that do not contain 0,
which is the most significant condition to avoid. All this is confirmed by
the next corollary.


Corollary 1. For any positive integer $q$, we have that

$$ P_{q,n} = q^n - (q-1)^n + O(\lceil q/2 \rceil^n) $$

as $n \to \infty$.
Table 1: Size of optimal Pearson codes, $P_{q,n}$, for $2 \le q \le 8$.

$q$    $P_{q,n}$
2      $2^n - 2$
3      $3^n - 2^{n+1} + 1$
4      $4^n - 3^n - 2^{n+1} + 3$
5      $5^n - 4^n - 3^n + 2$
6      $6^n - 5^n - 3^n - 2^n + 4$
7      $7^n - 6^n - 4^n + 2^n + 1$
8      $8^n - 7^n - 4^n + 3$


Proof. The $d = 1$ term in the sum on the right hand side of (14) is
$q^n - (q-1)^n - 1$, and the absolute values of the remaining terms
($d \ge 2$) are each bounded by $\lceil q/2 \rceil^n$, since

$$ \lfloor (q-1)/d \rfloor + 1 \le \lfloor (q-1)/2 \rfloor + 1 = \lceil q/2 \rceil. $$ □

As discussed above, the 2-constrained codes $\mathcal{S}_{q,n}(0,1)$ and
$\mathcal{S}_{q,n}(0, q-1)$ are Pearson codes. Therefore, it is of interest to
compare $P_{q,n}$ with the cardinality $N_2(q,n)$ of 2-constrained codes. For
$q \le 3$, we simply have $\mathcal{S}_{q,n}(0,1) = \mathcal{P}_{q,n}$, and thus
$N_2(q,n) = P_{q,n}$. However, for $q \ge 4$, we infer from (8), i.e.,
$N_2(q,n) = q^n - 2(q-1)^n + (q-2)^n$, and Corollary 1, i.e.,
$P_{q,n} = q^n - (q-1)^n + O(\lceil q/2 \rceil^n)$, that $N_2(q,n) < P_{q,n}$,
with a possible exception for very small values of $n$. For all $q \ge 2$,

$$ P_{q,2} = N_2(q,2) = 2 \tag{17} $$

and it is not hard to show that

$$ P_{q,3} = 6 \sum_{j=1}^{q-1} \phi(j), \tag{18} $$

where $\phi(j)$ is Euler’s totient function that counts the totatives of $j$,
i.e., the positive integers less than or equal to $j$ that are relatively
prime to $j$.
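Identities (17) and (18) can be confirmed for small alphabets by counting the words with minimum 0, positive maximum, and greatest common divisor 1 directly. This is a brute-force sketch; `totient` and `count_pearson` are names chosen for this example, and the totient is computed by its definition rather than by factorization.

```python
from functools import reduce
from itertools import product
from math import gcd


def totient(j):
    """Euler's totient: integers k in 1..j that are relatively prime to j."""
    return sum(1 for k in range(1, j + 1) if gcd(k, j) == 1)


def count_pearson(q, n):
    """Number of q-ary words of length n with min 0, max > 0 and GCD 1."""
    return sum(1 for w in product(range(q), repeat=n)
               if min(w) == 0 and max(w) > 0 and reduce(gcd, w) == 1)


for q in range(2, 9):
    lhs = count_pearson(q, 3)
    rhs = 6 * sum(totient(j) for j in range(1, q))
    print(q, count_pearson(q, 2), lhs, rhs)
```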

We have computed the cardinalities $N_1(q,n)$, $N_2(q,n)$, and $P_{q,n}$ by
invoking (7), (8), and the expressions in Table 1. Table 2 lists the results
of our computations for selected values of $q$ and $n$.

Table 2: $N_2(q,n)$, $P_{q,n}$, and $N_1(q,n)$ for selected values of $q$ and $n$.

$n$   $q$   $N_2(q,n)$   $P_{q,n}$   $N_1(q,n)$
4     4     110          146         175
4     5     194          290         369
4     6     302          578         671
5     4     570          720         781
5     5     1320         1860        2101
5     6     2550         4380        4651
6     4     2702         3242        3367
6     5     8162         10802       11529
6     6     19502        30242       31031
7     4     12138        13944       14197
7     5     47544        59556       61741
7     6     140070       199500      201811
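The entries of Table 2 follow from (7), (8), and the Table 1 expressions by direct evaluation, as the short sketch below illustrates. The dictionary `TABLE1` and the function names are chosen for this example; only the rows for $q = 4, 5, 6$ are encoded.

```python
# Table 1 expressions for P_{q,n}, keyed by q (only the rows needed here).
TABLE1 = {
    4: lambda n: 4**n - 3**n - 2**(n + 1) + 3,
    5: lambda n: 5**n - 4**n - 3**n + 2,
    6: lambda n: 6**n - 5**n - 3**n - 2**n + 4,
}


def n1(q, n):
    return q**n - (q - 1)**n                       # equation (7)


def n2(q, n):
    return q**n - 2 * (q - 1)**n + (q - 2)**n      # equation (8)


def table2_rows(ns=(4, 5, 6, 7), qs=(4, 5, 6)):
    """Rows (n, q, N2, P, N1) in the order used by Table 2."""
    return [(n, q, n2(q, n), TABLE1[q](n), n1(q, n)) for n in ns for q in qs]


for row in table2_rows():
    print(*row)
```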


4.2 Redundancy 

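As usual, the redundancy of a $q$-ary code $C$ of length $n$ is defined by
$n - \log_q |C|$. From (7), it follows that the redundancy of a 1-constrained
code is

$$ r_1 = n - \log_q (q^n - (q-1)^n) = -\log_q \left( 1 - \left( \frac{q-1}{q} \right)^n \right) \approx \left( \frac{q-1}{q} \right)^n / \ln(q), \tag{19} $$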

for $n$ sufficiently large, where the approximation follows from the
well-known fact that $\ln(1 + a) \approx a$ when $a$ is close to 0. Similarly,
from (8) we infer the redundancy of a 2-constrained code, namely

$$ r_2 = n - \log_q (q^n - 2(q-1)^n + (q-2)^n) = -\log_q \left( 1 - 2 \left( \frac{q-1}{q} \right)^n + \left( \frac{q-2}{q} \right)^n \right) \approx 2 \left( \frac{q-1}{q} \right)^n / \ln(q) \approx 2 r_1, \tag{20} $$

for $n$ sufficiently large. Since the 2-constrained code $\mathcal{S}_{q,n}(0,1)$
is optimal for $q = 2, 3$, the expression for $r_2$ gives the minimum
redundancy for any binary or ternary Pearson code. From Corollary 1, it
follows for $q \ge 4$ that the redundancy of optimal Pearson codes equals

$$ r_P = n - \log_q \left( q^n - (q-1)^n + O(\lceil q/2 \rceil^n) \right) \approx \left( \frac{q-1}{q} \right)^n / \ln(q) \approx r_1. \tag{21} $$

In conclusion, for sufficiently large $n$, we have

$$ r_P = r_2 \approx 2 r_1 \tag{22} $$

if $q = 2, 3$, while

$$ r_P \approx r_1 \approx r_2 / 2 \tag{23} $$


if $q \ge 4$. Figure 1 shows, as an example, the redundancies $r_1$, $r_2$, and
$r_P$ versus $n$ for $q = 8$ (the quantity $r_P$ was computed using the
expression listed in Table 1). Note that the redundancy $r_P$ decreases while
the redundancy of prior art balanced codes, $r_0$, see (1), increases with
increasing codeword length $n$. The curve $r_0$ versus $n$ was not plotted in
Figure 1 as the redundancy of balanced codes is much higher than that
of Pearson codes. For example, an evaluation of (1) shows that the
redundancy $r_0 = 2.79$ for $q = 8$ and $n = 10$, while $r_P = 0.147$ for the
same parameters.
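The quoted redundancy values can be reproduced numerically. The sketch below is an illustration with hypothetical function names. The expression used for $r_0$ is an assumed reading of (1), namely $\log_q n + \log_q((q^2-1)\sqrt{q^2-4}) + \log_q(\pi/(12\sqrt{15}))$; it is adopted here because it reproduces the value 2.79 stated in the text, not because the source spells it out legibly.

```python
from math import log, pi, sqrt


def r1(q, n):
    """Redundancy of a 1-constrained code, from (7)."""
    return n - log(q**n - (q - 1)**n, q)


def r2(q, n):
    """Redundancy of a 2-constrained code, from (8)."""
    return n - log(q**n - 2 * (q - 1)**n + (q - 2)**n, q)


def rp_q8(n):
    """Redundancy of an optimal Pearson code for q = 8 (Table 1 expression)."""
    return n - log(8**n - 7**n - 4**n + 3, 8)


def r0(q, n):
    """Balanced-code redundancy; ASSUMED reading of equation (1)."""
    return (log(n, q)
            + log((q * q - 1) * sqrt(q * q - 4), q)
            + log(pi / (12 * sqrt(15)), q))


q, n = 8, 10
print(round(r1(q, n), 3), round(rp_q8(n), 3), round(r2(q, n), 3))
print(round(r0(q, n), 2))
```

For these parameters the three Pearson-related redundancies are ordered as $r_1 < r_P < r_2$, consistent with (10).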
Figure 1: Redundancy $r_1$, $r_2$, and $r_P$ versus $n$ for $q = 8$.


5 Conclusions 

We have studied sets of $q$-ary codewords of length $n$, coined Pearson
codes, that can be detected unambiguously by a detector based on the
Pearson distance. We have formulated the properties of codewords in
Pearson codes. We have presented constructions of optimal Pearson codes
and evaluated their cardinalities and redundancies. We conclude that,
except for small values of $q$ and/or $n$, the redundancy of optimal Pearson
codes is almost the same as the redundancy of 1-constrained codes.